Eagle is a family of vision-centric, high-resolution multimodal large language models (MLLMs). It improves the visual perception of multimodal LLMs by combining vision encoders that have different architectures and cover different knowledge domains.
Image-to-Text
Transformers
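
The idea of combining several vision encoders, as the description above puts it, can be illustrated with a small sketch. This is not Eagle's code: the description does not say how the encoders are fused, so the sketch assumes simple channel concatenation followed by a linear projection. The encoders are random linear stand-ins, and all names (`make_stub_encoder`, `fuse_by_channel_concat`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_stub_encoder(patch_dim, out_dim):
    """Return a stand-in 'vision encoder': a fixed random linear map per patch.

    Hypothetical placeholder for a real pretrained encoder.
    """
    weight = rng.standard_normal((patch_dim, out_dim)) / np.sqrt(patch_dim)

    def encode(patches):
        # patches: (num_patches, patch_dim) -> (num_patches, out_dim)
        return patches @ weight

    return encode


def fuse_by_channel_concat(patches, encoders, projection):
    """Encode the same patches with every encoder and fuse the results.

    Assumption: fusion is concatenation along the feature (channel) axis,
    followed by a linear projection to the language model's width.
    """
    features = [encode(patches) for encode in encoders]
    fused = np.concatenate(features, axis=-1)  # (num_patches, sum of out_dims)
    return fused @ projection                  # (num_patches, llm_dim)


# Toy dimensions, chosen only for illustration.
num_patches, patch_dim, llm_dim = 16, 48, 128
encoder_a = make_stub_encoder(patch_dim, 32)  # stands in for one architecture
encoder_b = make_stub_encoder(patch_dim, 64)  # stands in for another

projection = rng.standard_normal((32 + 64, llm_dim)) / np.sqrt(32 + 64)
patches = rng.standard_normal((num_patches, patch_dim))

visual_tokens = fuse_by_channel_concat(patches, [encoder_a, encoder_b], projection)
print(visual_tokens.shape)  # (16, 128)
```

In this sketch each patch keeps one token, and only the feature width grows with the number of encoders. That makes concatenation an easy baseline for mixing encoders whose outputs are spatially aligned.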